0.  Introduction.

The problem of maintaining a representation of equivalence classes
or partitions of a set arises in many applications. Aho, Hopcroft, and
Ullman [1, Chapter 4] have called this the UNION-FIND problem, and they
begin their exposition by introducing the following simple data organization:

Let R[x] be the name of the equivalence class containing element x.
Let N[s] be the number of elements in equivalence class s.
Let L[s] designate a linked list containing the elements of class s.
To merge disjoint equivalence classes s and t, where N[s] ≤ N[t],
set R[x] ← t for all x in L[s], append L[s] to L[t],
add N[s] to N[t], and call the new equivalence class t.

Initially all classes have size 1, and they are merged into larger and
larger classes as the algorithm proceeds.

This strategy allows us to find the equivalence class containing a
given element in constant time; and the cost of replacing two classes
by their union is essentially proportional to the size of the smaller
class, i.e., the number of times R[x] is changed. If there are n
elements in all, it is easy to see that R[x] is changed at most lg n
times for each x (here lg denotes the binary logarithm), since the class
containing x must at least double in size whenever R[x] changes.
Therefore it will take at most O(n log n) units of time to do all the
union operations.
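The data organization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `QuickFindWeighted` and the counter `changes` are hypothetical, and ordinary Python lists stand in for the linked lists L[s]; the arrays R, N, L follow the text.

```python
class QuickFindWeighted:
    """Sketch of the R/N/L organization; Python lists replace linked lists."""

    def __init__(self, n):
        self.R = list(range(n))            # R[x]: name of the class containing x
        self.N = [1] * n                   # N[s]: number of elements in class s
        self.L = [[x] for x in range(n)]   # L[s]: the elements of class s
        self.changes = 0                   # total number of times some R[x] changed

    def find(self, x):
        """Constant-time lookup of the class containing x."""
        return self.R[x]

    def union(self, x, y):
        """Merge the classes of x and y; return the number of R entries changed."""
        s, t = self.R[x], self.R[y]
        if s == t:
            return 0                       # already equivalent, nothing to do
        if self.N[s] > self.N[t]:
            s, t = t, s                    # make s the smaller class
        for z in self.L[s]:
            self.R[z] = t                  # rename every element of the smaller class
        cost = self.N[s]
        self.L[t].extend(self.L[s])
        self.L[s] = []
        self.N[t] += self.N[s]
        self.N[s] = 0
        self.changes += cost
        return cost


uf = QuickFindWeighted(8)
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    uf.union(a, b)
print(uf.changes)   # 12, which equals (n/2) lg n for n = 8
```

The merge order in the usage example is the balanced one, which attains the worst case of one change per element per doubling.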

In this paper we shall prove that the average amount of time to do
all unions by the above method is only O(n), thereby establishing a
conjecture of A. C. Yao [12]. The probability distribution on the set of
possible input sequences, which leads to such "average" behavior, can be
defined in several equivalent ways corresponding to the conventional notion
of a random graph; in essence, the probability that classes s and t will
be merged at any particular step is proportional to N[s]N[t].

Section 1 describes a convenient way to deal with large random
graphs, by analogy with the treatment of large systems of particles in
statistical mechanics, an approach which was first suggested by
V. E. Stepanov [10]. Section 2 develops several estimates useful in
the study of this probability model, and Section 3 explains how to apply
the resulting formulas to the above algorithm. The proof of linearity
is completed in Sections 4, 5, and 6.

Following Yao [12], we shall call the above algorithm QFW, for
"quick find weighted"; one can quickly find the equivalence class
containing x by simply looking at R[x], and the class sizes or
weights N[s] are used to decide how the updating is done. QFW is a
refinement of the algorithm QF, which dispenses with the N[s] table
and simply updates one of the two classes selected arbitrarily. In
Section 7 the QF algorithm is shown to require $\approx n^2/8$ updates on the
average. Empirical experiments on QF and QFW, confirming this theory,
appear in Section 8.

Section 9 discusses another probability model under which we might
wish to study the average behavior of QF and QFW, based on the hypothesis
that the actual unions to be performed take place in random order.
Recurrence relations which arise in this model are studied in Sections 10,
11, and 12, culminating in detailed exact or asymptotic calculations of the
average cost.

Finally, Section 13 discusses the distribution of "union trees"
associated with equivalence algorithms, and relates such trees to two
other algorithms (QM and QMW) described by Yao, in addition to QF
and QFW. Several open problems conclude the paper.


1.  Connectivity  of  Random  Graphs 


Let us imagine that each of the $(n^2-n)/2$ pairs of distinct elements
$\{x,y\}$ has been associated in some manner with $(n^2-n)/2$ independent equal-sized
samples of some radioactive substance like radium, where there is probability
$e^{-t}$ that any particular sample of radium has emitted no $\alpha$ particles
between time 0 and time t. When the radium associated with $\{x,y\}$
fires off its first particle, we immediately draw a line between x and y;
at any time $t \ge 0$ the lines drawn in this way define an undirected graph
on the n given elements.

Let $P_n(t)$ be the probability that the random graph defined in this
way is connected at time t; thus $P_n(t)$ is an increasing function which


that 


Another way to define a random graph is to say that each of the
$(n^2-n)/2$ edges is independently present with probability p and absent
with probability q = 1-p; then $P_n(t)$ is the probability of connectedness
when $p = 1-e^{-t}$.


This  definition  was  introduced  by  Gilbert  [ 5 ],  who 


shall see that Stepanov's physical interpretation tends to be more
suggestive in developing the theory.

Incidentally, $P_n(t)$ may be regarded as a generating function for
two types of discrete quantities associated with random graphs: If C(n,m)
denotes the number of connected graphs on n labeled vertices having m
edges, we have

(1.1)  $P_n(t) = \sum_{m \ge 0} C(n,m)\,(1-e^{-t})^m\,(e^{-t})^{\binom{n}{2}-m}$
       $= e^{-\binom{n}{2}t} \sum_{m \ge 0} C(n,m)\,(e^t-1)^m$ ;

and if A(n,m) denotes the number of ordered sequences of edges
$\{x_1,y_1\}, \{x_2,y_2\}, \ldots, \{x_m,y_m\}$ defining a connected graph, where $x_j \ne y_j$
but duplicate edges $\{x_i,y_i\} = \{x_j,y_j\}$ are allowed, we have

(1.2)  $P_n(t) = e^{-\binom{n}{2}t} \sum_{m \ge 0} A(n,m)\,t^m/m!$ ,

since $e^{-t}t^k/k!$ is the probability that a given edge has "fired" exactly
k times. The sum in (1.1) can, of course, be restricted to the range
$n-1 \le m \le (n^2-n)/2$, since C(n,m) = 0 when m < n-1; similarly, we can
replace "$m \ge 0$" by "$m \ge n-1$" in (1.2).

It is easy to compute the functions $P_n(t)$ for n = 1, 2, ... by using
the recurrence formula

(1.3)  $\sum_{1 \le k \le n} \binom{n-1}{k-1} P_k(t)\, e^{-k(n-k)t} = 1$ ;

this formula follows immediately from the fact that the k-th term of the
sum is the probability that a particular point x is connected to exactly
k points (including itself) at time t. Identity (1.3) has a remarkable
corollary,

(1.4)  $\sum_{1 \le k \le n} \binom{n-1}{k-1} P_k(t)\,(z + e^{-kt})^{n-k} = (1+z)^{n-1}$ ,

which holds for all z; the coefficient of $z^m$ on the left-hand side of
(1.4) can be shown to equal the coefficient on the right, using (1.3).
Stepanov [9] discovered two nonlinear identities

(1.5)  $P_n(t) = \sum_{1 \le k < n} \binom{n-2}{k-1} P_k(t)P_{n-k}(t)\,\bigl(e^{-(n-k-1)kt} - e^{-(n-k)kt}\bigr)$ ,

(1.6)  $P_n'(t) = \binom{n}{2} \sum_{1 \le k < n} \binom{n-2}{k-1} P_k(t)P_{n-k}(t)\, e^{-k(n-k)t}$ ,


for which he gave rather lengthy algebraic and analytic proofs. His
first formula can be proved more directly by observing that the k-th
term in the sum is the probability of a connected graph in which a
particular point x would be connected to exactly k points if another
particular point y were removed. There are $\binom{n-2}{k-1}$ ways to choose
the k-1 other points, and the graph restricted to x and those other
points must be connected, as must the graph restricted to the remaining
n-k points including y; and there must be at least one edge from the
k points to y, but none from the k points to the remaining n-1-k.
Stepanov's second formula can be proved by noting that $P_n'(t)\,dt$ is the
probability that the graph becomes connected at time t (i.e., between
t and t + dt); this is the number of ways to choose an edge $\{x,y\}$,
times the number of ways to divide the n points into a set of k elements
containing x and a set of n-k elements containing y, times the
probability that the k points and the n-k points are already connected,
times the probability $e^{-t}\,dt$ that the edge $\{x,y\}$ has just "fired", times
the probability that the other k(n-k)-1 edges between the two sets have
not yet fired.

Incidentally, $P_n(t)$ is also relevant to random directed graphs on
n vertices: If each of the $n^2$ possible arcs (x,y) is independently
present with probability $1-e^{-t}$, then $P_n(t)$ is the probability that a
particular vertex x is a "root", i.e., that there is an oriented path
from x to all other vertices. Perhaps the simplest way to prove this fact
is to show that the stated probability satisfies recurrence (1.3).
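The recurrence for the connectedness probabilities lends itself to exact computation. The sketch below assumes that the recurrence reads $\sum_k \binom{n-1}{k-1} P_k(t)\, e^{-k(n-k)t} = 1$ (the reading suggested by the description of its k-th term), and it works with the rational quantity $q = e^{-t}$ so that all arithmetic is exact. The function name `connectivity_probabilities` is hypothetical.

```python
from fractions import Fraction
from math import comb


def connectivity_probabilities(n_max, q):
    """Return [None, P_1, ..., P_n_max], where q = exp(-t) is the chance an edge is absent.

    Assumes the recurrence  sum_k C(n-1, k-1) P_k q^(k(n-k)) = 1,  solved for P_n.
    """
    P = [None, Fraction(1)]                      # a single point is always connected
    for n in range(2, n_max + 1):
        rest = sum(comb(n - 1, k - 1) * P[k] * q ** (k * (n - k))
                   for k in range(1, n))         # x lies in a component of size k < n
        P.append(1 - rest)
    return P


q = Fraction(3, 5)
P = connectivity_probabilities(6, q)
print([float(p) for p in P[1:]])
```

With a rational q one can also test polynomial identities in z exactly, for instance by substituting a rational z into both sides of the corollary stated in the text.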
2.  Bounds on the Probability of Connectedness.

If we set $z = -e^{-nt}$ in (1.4), we find

(2.1)  $P_n(t) = (1-e^{-nt})^{n-1} - \sum_{1 \le k < n} \binom{n-1}{k-1} P_k(t)\,(e^{-kt} - e^{-nt})^{n-k}$ ,

hence (cf. [10])

(2.2)  $P_n(t) \le (1-e^{-nt})^{n-1}$ .


In fact, a similar argument proves the sharper upper bound


P„(t)  i 


but we will not need this improvement. When t is large, the bound in
(2.2) is very good because the correction terms dropped from (2.1) become
exponentially small; but when t is near zero, we can squeeze another
factor of n out of the upper bound, since (cf. [11, p. 228])

(2.3)  $P_n(t) \le n^{n-2}(1-e^{-t})^{n-1}$ .

This formula follows because a connected graph must contain a spanning tree as
a subgraph; there are $n^{n-2}$ spanning trees on n labeled points and $(1-e^{-t})^{n-1}$
is the probability that any particular spanning tree is present. A simple lower
bound for $P_n(t)$ is obtained by considering only the term for m = n-1
in (1.1):

(2.4)  $P_n(t) \ge n^{n-2}(1-e^{-t})^{n-1}\, e^{-(\binom{n}{2}-n+1)t}$ .

Relations (2.3) and (2.4) combine to give the formula

(2.5)  $P_n(t) = n^{n-2}\, t^{n-1}\,(1 - O(n^2 t))$ .

(Here and in the sequel we shall use O notation to stand for functions bounded
by absolute constants, depending only on specified conditions. For example,
in (2.5) the $O(n^2 t)$ stands for any function of n and t whose absolute
value is at most $Cn^2 t$ for some C, when $n \ge 1$ and $t \ge 0$.)

We shall be especially concerned with values of $P_n(t)$ for $t \approx 1/n$,
and the upper bound (2.2) shows that $P_n(t)$ is exponentially small in
this critical range. In order to understand more easily what is going
on, let us magnify the values by defining

(2.6)  $\omega_n(t) = P_n(t)/(1-e^{-nt})^{n-1}$ .

If we apply formula (1.6), together with formula (1.5) both as it stands
and with k replaced by n-k, we obtain

(2.7)  $\omega_n'(t) = \bigl((1-e^{-nt})P_n'(t) - n(n-1)e^{-nt}P_n(t)\bigr)/(1-e^{-nt})^n$
       $= \frac{n(n-1)}{2} \sum_k \binom{n-2}{k-1} \frac{P_k(t)P_{n-k}(t)}{(1-e^{-nt})^n}\, e^{-k(n-k)t}\bigl(1 - e^{-nt} - e^{-nt}(e^{kt} - 1 + e^{(n-k)t} - 1)\bigr)$ ;

hence $\omega_n(t)$ satisfies a surprisingly simple differential difference
equation (cf. [10]):

(2.8)  $\omega_n'(t) = \frac12 \sum_k \binom{n}{k}\, k\,\omega_k(t)\,(n-k)\,\omega_{n-k}(t) \left(\frac{\sinh(kt/2)}{\sinh(nt/2)}\right)^{k} \left(\frac{\sinh((n-k)t/2)}{\sinh(nt/2)}\right)^{n-k}$ .

It follows in particular that $\omega_n(t)$ is monotone increasing. Our bounds
on $P_n(t)$ imply that

(2.9)  $\omega_n(t) = \frac1n\,(1 + O(n^2 t))$  for $t = O(n^{-2})$ ;

(2.10)  $\frac1n \le \omega_n(t) \le 1$ .
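The magnified quantity can be tabulated exactly for small n as a sanity check on the bounds $1/n \le \omega_n(t) \le 1$ and on monotonicity. The sketch below is an illustration only: it assumes the recurrence $\sum_k \binom{n-1}{k-1} P_k\, q^{k(n-k)} = 1$ with $q = e^{-t}$ for computing $P_n$, and the helper name `omega` is hypothetical.

```python
from fractions import Fraction
from math import comb


def omega(n_max, q):
    """Return [None, w_1, ..., w_n_max] with w_n = P_n / (1 - q^n)^(n-1), q = exp(-t)."""
    P = [None, Fraction(1)]
    for n in range(2, n_max + 1):
        P.append(1 - sum(comb(n - 1, k - 1) * P[k] * q ** (k * (n - k))
                         for k in range(1, n)))
    return [None] + [P[n] / (1 - q ** n) ** (n - 1) for n in range(1, n_max + 1)]


# q close to 1 means t small; q close to 0 means t large.
for q in (Fraction(9, 10), Fraction(1, 2), Fraction(1, 10)):
    w = omega(8, q)
    print(float(q), [round(float(w[n]), 4) for n in (2, 4, 8)])
```

For n = 2 the definition gives $\omega_2 = (1-q)/(1-q^2) = 1/(1+q)$, which moves from 1/2 to 1 as t grows, in agreement with the stated bounds.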
We can also obtain a recurrence for $\omega_n(t)$ analogous to (1.3) and (2.7),
using (1.4) with $z = -e^{-nt}$:

(2.11)  $\sum_{1 \le k \le n} \binom{n-1}{k-1}\, \omega_k(t)\, e^{-k(n-k)t} \left(\frac{1-e^{-kt}}{1-e^{-nt}}\right)^{k-1} \left(\frac{1-e^{-(n-k)t}}{1-e^{-nt}}\right)^{n-k} = 1$ .


We shall make several uses of the following estimate for $\omega_n(t)$, which
is of particular interest when $t \le n^{-3/2}$.

Lemma 1.  $\omega_n(t) \le \frac1n \exp(c\,n^{3/2}t)$, where $c = \sqrt{\pi/8} \approx .62666$.

Proof.  It is easy to verify that $\sinh(at)/\sinh(bt) \le a/b$ when $0 < a \le b$
and t > 0, hence (2.8) implies

(2.12)  $\omega_n'(t) \le \frac12 \sum_k \binom{n}{k}\, k\,\omega_k(t)\,(n-k)\,\omega_{n-k}(t) \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k}$ .

Note that equality holds when t = 0. Let us now consider the quantity
$\binom{n}{k}(k/n)^k((n-k)/n)^{n-k}$ which appears in this sum. Since

$\ln n! = n \ln n - n + \ln\sqrt{2\pi n} + \int_n^\infty t^{-2}h(t)\,dt$ ,

where $h(t) = \frac12 \{t\}(1-\{t\})$, we have


hence 


n CO 
k n-k 


t"^h(t)  dt  ; 


I l«(K.n)  < /i  Z ~k= 

^^0<k<n  Vk(n-k) 


0 < k <n 
1 


“(2’  0 ■ -f? 
3.  Connection to the Equivalence Algorithm.

When the radium associated with edge $\{x,y\}$ emits an $\alpha$-particle, we
can imagine invoking the equivalence algorithm at that instant, merging
classes R[x] and R[y] if they are distinct. Then the equivalence
classes at any time will be the same as the connected components of the
random graph. The probability that two edges fire simultaneously is zero;
and as $t \to \infty$ the graph becomes connected with probability 1. In effect
we are considering a random execution of the equivalence algorithm where the
classes to be merged at each stage are selected by choosing uniformly among
all pairs (x,y) of elements that are not already equivalent. This seems
to be the most natural way to define the average behavior of the process.
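A random execution of this kind is easy to simulate. The following sketch tracks only the class sizes, relying on the observation that a uniformly chosen ordered pair (x,y) of inequivalent elements selects a class of size k for x and a class of size m for y with probability proportional to km. The function name `random_execution` and the cost conventions (weighted cost min(k,m); unweighted cost taken as the size of the class containing x) are assumptions made for illustration.

```python
import random


def random_execution(n, rng):
    """Merge classes until one remains; return (weighted cost, unweighted cost)."""
    sizes = [1] * n
    weighted = 0
    unweighted = 0
    while len(sizes) > 1:
        # Choose the class of x: a class of size s holds s elements, each of which
        # has n - s inequivalent partners, hence weight s * (n - s).
        i = rng.choices(range(len(sizes)), weights=[s * (n - s) for s in sizes])[0]
        # Choose the class of y among the other classes, proportionally to size.
        others = [j for j in range(len(sizes)) if j != i]
        j = rng.choices(others, weights=[sizes[j] for j in others])[0]
        k, m = sizes[i], sizes[j]
        weighted += min(k, m)      # QFW renames the smaller class
        unweighted += k            # QF (as assumed here) renames the class of x
        for idx in sorted((i, j), reverse=True):
            sizes.pop(idx)
        sizes.append(k + m)
    return weighted, unweighted


rng = random.Random(2024)
runs = [random_execution(64, rng) for _ in range(10)]
print(sum(w for w, _ in runs) / 10, sum(u for _, u in runs) / 10)
```

Averaging over more runs gives figures that can be compared with the Monte Carlo results reported later in the paper; no particular output is claimed here, since it depends on the random seed.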

When R[x] is a class of size k and R[y] is a class of size m,
let us say that the algorithm does a (k,m)-merge; the cost of such a
merge is min(k,m). Therefore the average running time to do n-1 unions
which connect the graph is

(3.1)  $\sum_{1 \le k,m < n} \min(k,m)\, E_{n,k,m}$ ,

where $E_{n,k,m}$ is the average number of (k,m)-merges performed. In more
intuitive terms, the average number of times the firing of an $\alpha$-particle
causes a component of size k to be joined to a component of size m is
$E_{n,k,m} + E_{n,m,k}$, when $k \ne m$.

Given any fixed way to partition the n elements into sets (A,B,C)
of respective sizes (k, m, n-k-m), the probability that the random process
will at some time do a (k,m)-merge with A and B as the respective
classes is

(3.2)  $\frac12 \int_0^\infty P_k(t)P_m(t)\, e^{-(k+m)(n-k-m)t}\, d(1-e^{-kmt})$ ,

since $1-e^{-kmt}$ is the distribution function for the firing of at least
one of the km edges between A and B, while
$P_k(t)P_m(t)\,e^{-(k+m)(n-k-m)t}$ is the probability that A and B are
internally connected but not joined to C at time t. (The factor 1/2
in (3.2) accounts for the probability that x instead of y belongs to
class A when the edge {x,y} fires, since we may regard (x,y) and
(y,x) as equally probable.) By considering all possible choices of A,
B, and C, we have

(3.3)  $E_{n,k,m} = \frac{n!}{2\,k!\,m!\,(n-k-m)!} \int_0^\infty P_k(t)P_m(t)\, km\, e^{-((k+m)(n-k-m)+km)t}\, dt$ .

For example, consider the simplest case k = m = 1: The expected
number of times we form a class of size 2 is

(3.4)  $E_{n,1,1} = \frac{n(n-1)}{2} \int_0^\infty e^{-(2n-3)t}\, dt = n(n-1)/(4n-6) \approx n/4$ .

It follows that about n/2 singletons are built into pairs, while the
other n/2 elements begin their interactions by being hooked to larger
components.

When k and m are fixed, we can deduce the asymptotic behavior of
$E_{n,k,m}$ as $n \to \infty$ by using only the comparatively weak estimate (2.5),
since the important contribution to the integral occurs when t is very
small. Let

(3.5)  $\ell = k+m$ ;

then

$E_{n,k,m} = \frac12 \binom{n}{\ell}\binom{\ell}{k}\, k^{k-1}m^{m-1} \int_0^\infty t^{\ell-2}\,(1-O(k^2t))(1-O(m^2t))\, e^{-(\ell(n-\ell)+km)t}\, dt$
4.  Preparations for the Estimations.

Our main goal is to prove that the sum (3.1) is O(n), and since
$E_{n,k,m}$ does not seem to have a simple formula we must content ourselves
with approximate values.

Stirling's approximation applied to (3.6) indicates that we might
expect the estimate


to be valid. If (4.1) could be proved, we would be done, since it implies
that


i^.2) 


T min(k,m)E  , < D k(E  , + E , ) 

Kk,m<n  - Kk<m<n 


l<k<m<n  \_k  / m^y  1< 


Z 0 

m <n 


n m 


1/2 


= 0(n)  . 


m 


Actually (4.1) is not true when k = 1 and m = n-1, as we shall see
later; however, the methods we shall discuss below are strong enough to prove
(4.1) in the special cases

(4.3)  $k, m \le n^{2/3}$  or  $k, m \ge n^{2/3}$ .

Fortunately this suffices to prove the desired result, since the "uncontrolled"
terms have a sum bounded by n: We have

$\sum_{1 \le k < n^{2/3} < m < n} k\,(E_{n,k,m} + E_{n,m,k}) \le n$ ,

since the left-hand side is less than the average number of times the QFW
algorithm changes R[x] while including x for the first time in a class




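5.  Estimate of the Integral.

Using the identity

(5.1)  $1 - e^{-at} = at \int_0^1 e^{-x_1 at}\, dx_1$

repeatedly in (4.6), we can express I(k,m,w) in the form

$k^{k-1}m^{m-1} \int_0^\infty \int_0^1 \cdots \int_0^1 t^{k+m-2} \exp\bigl(-wt - k(x_1+\cdots+x_{k-1})t - m(y_1+\cdots+y_{m-1})t\bigr)\, dx\, dy\, dt$ ,

with k-1 + m-1 inner integrals, where $dx = dx_1 \ldots dx_{k-1}$ and
$dy = dy_1 \ldots dy_{m-1}$. Hence

(5.2)  $I(k,m,w) = k^{k-1}m^{m-1}(k+m-2)! \int_0^1 \cdots \int_0^1 \frac{dx\, dy}{(w + k\xi + m\eta)^{k+m-1}}$ ,

where $\xi = x_1 + \cdots + x_{k-1}$ and $\eta = y_1 + \cdots + y_{m-1}$. Let us translate
the domain of integration, writing

(5.3)  $I(k,m,w) = k^{k-1}m^{m-1}(k+m-2)!\; J\bigl(k,m,\, w + k(k-1)/2 + m(m-1)/2\bigr)$ ,

(5.4)  $J(k,m,w) = \int_{-1/2}^{+1/2} \cdots \int_{-1/2}^{+1/2} \frac{dx\, dy}{(w + k\xi + m\eta)^{k+m-1}}$ .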

We wish to estimate J(k,m,w), but first let us try the same kind
of operations on a similar but simpler integral:

$\int_0^\infty (1-e^{-at})^{k-1} e^{-wt}\, dt = a^{k-1}(k-1)! \int_{-1/2}^{1/2} \cdots \int_{-1/2}^{1/2} \frac{dx}{(w + a(k-1)/2 + a\xi)^{k}}$ .

Since the integral in this case can be evaluated exactly as a Beta integral,

$\int_0^\infty (1-e^{-at})^{k-1} e^{-wt}\, dt = \frac{a^{k-1}(k-1)!}{w(w+a)\cdots(w+(k-1)a)}$ ,

we have derived the rather remarkable formula

(5.5)  $\int_{-1/2}^{1/2} \cdots \int_{-1/2}^{1/2} \frac{dx}{(w + a\xi)^{k}} = \frac{1}{(w - a(k-1)/2)(w - a(k-3)/2)\cdots(w + a(k-1)/2)}$ .

Incidentally, (5.5) may be regarded as a consequence of the considerably
more general identity

(5.6)  $\Delta^k f(w) = \sum_j \binom{k}{j}(-1)^{k-j} f(w+j) = \int_0^1 \cdots \int_0^1 f^{(k)}(w + t_1 + \cdots + t_k)\, dt_1 \ldots dt_k$

used in interpolation theory.

Equation (5.5) can be used to estimate (5.4). First, since the
logarithm function is concave ($\ln(x+ty) \ge (1-t)\ln x + t\ln(x+y)$),
we have

$(k+m)\ln(w + k\xi + m\eta) \ge k\ln(w + k\xi) + m\ln(w + k\xi + (k+m)\eta)$ ;

hence


(5.7)  J(k,m,w)  < J 


...  dx  1/2 

-1/2  -1/2^  (w+  kg)^^-l/2 


•i* 


1/2 


dy  (w+ke^-'  mt,) 


1/2  (w+ k^  + (k+m)Tlj 


m 


k-1 


m-1 


n^/^  ...  pV2  dx  

-1/2  -1/2  (w+  k§)^^  w+k?  - (k+m)  w+  k§+  (k+m) 


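Secondly, since

$\int_{x-1/2}^{x+1/2} \ln u\, du = \ln x + O(x^{-2})$

for $x \ge 1/2$, we have

(5.8)  $\ln\Bigl(\bigl(v - \tfrac{m-1}{2}\bigr)\bigl(v - \tfrac{m-3}{2}\bigr)\cdots\bigl(v + \tfrac{m-1}{2}\bigr)\Bigr) = \int_{v-m/2}^{v+m/2} \ln u\, du + O(1)$
       $= m\ln v - f(m,v) + O(1)$ ,

where

(5.9)  $f(m,v) = \frac{m^3}{2\cdot3\,(2v)^2} + \frac{m^5}{4\cdot5\,(2v)^4} + \frac{m^7}{6\cdot7\,(2v)^6} + \cdots = \sum_{j \ge 1} \frac{m^{2j+1}}{2j(2j+1)(2v)^{2j}}$

is a convergent series provided that $m \le 2v$. Therefore (5.7) yields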
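The approximation of a product of equally spaced factors by the series f(m,v) can be checked numerically. The sketch below assumes the series has general term $m^{2j+1}/(2j(2j+1)(2v)^{2j})$, which is what the expansion of $\int_{v-m/2}^{v+m/2} \ln u\,du$ about v gives; the names `f` and `log_product` are chosen for illustration.

```python
import math


def f(m, v, terms=200):
    """Partial sum of the series for f(m, v); intended for m <= 2v."""
    r = m / (2 * v)
    # m^(2j+1) / (2v)^(2j) is written as m * r^(2j) to avoid overflow
    return sum(m * r ** (2 * j) / (2 * j * (2 * j + 1)) for j in range(1, terms + 1))


def log_product(m, v):
    """ln of (v - (m-1)/2)(v - (m-3)/2)...(v + (m-1)/2), a product of m factors."""
    return sum(math.log(v - (m - 1) / 2 + i) for i in range(m))


m, v = 50, 100.0
exact = log_product(m, v)
approx = m * math.log(v) - f(m, v)
print(exact - approx)   # a small positive number: the O(1) term is tiny here
```

The difference printed is the accumulated midpoint-rule error, roughly the sum of $1/(24x^2)$ over the factors x, which stays bounded as the text asserts.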


J(k,m,w)  < 0(w+ k^  + m^)  J‘  ^ — expf  f fm  , 

-1/2  -1/2  (w+k§)^^""  V ' k+m  JJ 


< 0(„+k=*m=)/^^.. Si' 

-1/2  --1/2  (w+ke)^^"'  k k+m 


Again we can use concavity of the logarithm to conclude that
$(k+m)\ln(w + k\xi) \ge m\ln w + k\ln(w + (k+m)\xi)$.

Using (5.5) again,

J(k,m,w)  < exp(f(m,  (w  - k(k-l)/2)/(k+m) ) ) 

w”^  (w-  (k+m)(k-l)/2)  ...  (w+  (k+m)(k-l)/2) 

w ^ ^ 


0(w+  k^  + m^) 


r.r 


w - kfk-ll/2  N / 


The only hypothesis we have required is that $k \le 2z$ when f(k,z) is
to be evaluated. We can therefore state the result of our calculations
as follows.


Lemma 2.  If $k \le m$ and $m(k+m) \le 2w + m(m-1)$, we have


l(k,m,  w)  < 0 


, k-1  m-1/,  „ > . 

k m (k+m-2)! 


(w+  k(k-l')/2  + ra(m-l)/2) 


k+m-1 


w + m(m-l 
k+m 


w+  k(k-l)/2  + m(m-l)/2 


Theorem 1.  The average time for the QFW algorithm to do its set unions
is O(n).


Proof.  Let k and m satisfy (4.3) and $k+m \le n$; we may assume that
$k \le m$. Let

(6.1)  $w = (k+m)n - (k+m)^2 + km - c(k^{3/2} + m^{3/2})$ ,

so that

(6.2)  $E_{n,k,m} \le \frac{n!}{2\,k!\,m!\,(n-k-m)!}\, I(k,m,w)$ .

We wish to apply Lemma 2 to estimate I(k,m,w); so we must check that
$m(k+m) \le 2w + m(m-1)$, i.e.,

(6.3)  $2c(k^{3/2} + m^{3/2}) \le 2(k+m)(n-k-m) + (k-1)m$ .

If $k \le m \le n^{2/3}$ this certainly holds for all sufficiently large n; and
when $n^{1/2}\ln n \le k \le m$ we obtain (6.3) for all large n by the estimates

$2c(k^{3/2} + m^{3/2}) \le 4cm^{3/2} < m^{3/2}\ln n - m \le (n^{1/2}\ln n - 1)m \le (k-1)m$ .

(We really only need to consider $k \ge n^{2/3}$ in this argument, but the more
general estimate will be useful in the proof of Theorem 2 below.)

In order to simplify the formulas obtained after applying Lemma 2 in
(6.2), we shall write

(6.4)  $y = n - (k+m-1)/2$ ,


" ( ” ' ( ) ' ( “ ))/ ’ 


noting  tiiat. 


- 1 

t ! 

f ) 


t , 

K: 


y = z + 1 + c 


k5/2 1 ’'/2 


OT 

k+m 


z + 1 + c 


'/in  . 


The factor n!/(n-k-m)! in (6.2) can be rewritten as

$(y - (k+m-1)/2)(y - (k+m-3)/2) \cdots (y + (k+m-1)/2) = O\bigl(y^{k+m} e^{-f(k+m,y)}\bigr)$ ,

by (5.8); hence (6.2) and Lemma 2 imply that

) p J m”'~^(k+m-2):  y^^”^  Vf  (m,  z-k(k-l)/2(k+m)  )+f  (k.  z ' -f  (k^m. 

I k:m;(k+m)*^'""^-V^'"-i  J 


= 0 


k5/2  „,5/2(j^,^)1/2 


where 

(6.")  R = 


2~[k+m]  ) ^‘(^>2)  - f(k+m,y)  + O^m  log  | ^ 


The proof of Theorem 1 will be complete if we can show that R is bounded
above, since we have already noted that Theorem 1 follows from (4.1) under
condition (4.3).

Relations (6.4), (6.5) make it clear that $z \ge n/3$ for all large n,
hence


(■'.3) 


( 1/2 

V I m ' 

i = 1+  0l  

z v n 


Furthermore it is clear from (5.9) that

$f(m, v+d) = f(m,v) + O(md/v)$ ,

and that

$f(k+m, y) - f(k, y) \ge f(k+m, u) - f(k, u)$  when $y \le u$ .

Let us set


('.9)  u 


k+m  /"  k(k-l)  \ 
m ^ ^ 2(k^mj  J 
Then $y \le u \le 2y$, and we can simplify R as follows:


Since 


(o.lO)  R=  f(;a,y  - |[g})+o(^2^)+f(k,y)  + o(i2!^)-f(k+m,y)  + o(^ 


(6.11)  r^m,  ^ u^  + f(k,u)  - f(k+m,u) 


T,  i ^ (m(k+m)^‘^  + - (k+m)^^^^l 

j>l  2J  (2J+1)  (2u)"^J 


kf  (k+m)^'^  - k^-^' 


j>l  2j(2j+l)(2u)^'^ 


< - 


2 " 


R is  surely  bounded  when  k < m - n ■ . On  the  other  hand  when 


■;  k < III  , let  g(n)  = ; then 


- ^-2  \ n 

9on  ' 


= - ^ g(n)^/^  + 0(g(n)) 


is  less  than  some  absolute  constant. 


The above proof of Theorem 1 shows that $E_{n,k,m}$ is exponentially small
•«hen  k > and  also  in  certain  other  cases  (e.g.  k = n^'^^  ^ ^ , 

V.  - ^ ) . Thus  it  is  rare  to  merge  two  large  classes;  one  way  to  state 


this is

Theorem 2.  The probability that the equivalence algorithm merges two


7.  The Unweighted Algorithm.

If the QFW algorithm had not used the array N[s], so that unions
would be done by renaming the elements in the larger class with probability
1/2, the average running time would be significantly greater. Let $E_{n,k}$
be the average number of equivalence classes of size k formed during a
random execution of the algorithm, i.e., the average number of components
of size k which appear, as the edges of the random graph appear in random
order. The average running time of the "unweighted" algorithm can be
expressed as

(7.1)  $\frac12 \sum_{1 \le k < n} k\, E_{n,k}$ ,

since the elements of each component of size < n have a 50-50 chance
of being renamed.

As in Equation (3.3) we can write down an integral for $E_{n,k}$, this
time more easily than before:

(7.2)  $E_{n,k} = \binom{n}{k} \int_0^\infty P_k(t)\, k(n-k)\, e^{-k(n-k)t}\, dt$ .


We can now argue as before to obtain satisfactory estimates of $E_{n,k}$ when
$k \le n^{2/3}$ or when k is sufficiently large:


Theorem 3.


where  c is  the  constant  of  Lemma  1; 


> for  n > k + c VlT  , 


J 
(b)  $E_{n,k} = 1 - \frac{n-k}{k}(H_n - H_{n-k}) + O\bigl(\frac{\log n}{n}\bigr)$  for $\epsilon \le k/n < 1$,
where the constant implied by the O may depend on $\epsilon$.


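Proof.  Since $\omega_k(t) \ge 1/k$ we have

$E_{n,k} \ge \binom{n}{k} \frac{n-k}{k} \int_0^1 (1-x)^{k-1} x^{n-k-1}\, dx = \frac{n}{k^2}$ ,

on setting $x = e^{-kt}$ and using well known properties of the Beta function.
The upper bound follows in a similar manner,

$E_{n,k} \le \binom{n}{k} \int_0^\infty e^{ck^{3/2}t}\,(1-e^{-kt})^{k-1}\,(n-k)\, e^{-k(n-k)t}\, dt$
$= \binom{n}{k} \frac{n-k}{k} \int_0^1 (1-x)^{k-1} x^{n-k-c\sqrt{k}-1}\, dx$
$= \frac{n}{k^2} \cdot \frac{(n-1)(n-2)\cdots(n-k)}{(n-c\sqrt{k}-1)(n-c\sqrt{k}-2)\cdots(n-c\sqrt{k}-k)}$
$\le \frac{n}{k^2} \exp\Bigl(c\sqrt{k}\Bigl(\frac{1}{n-c\sqrt{k}-1} + \cdots + \frac{1}{n-c\sqrt{k}-k}\Bigr)\Bigr)$ ,

since $x/(x-y) \le e^{y/(x-y)}$.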

Here $H_n = \sum_{1 \le k \le n} 1/k$.

To prove (b) we use Stepanov's theorem [10] that

(7.3)  $\omega_n(t) = (1 - (1+nt)e^{-nt})(1 + o(1))$

uniformly for $t \ge y_0/n$; by careful analysis of his proof we can replace
the o(1) term by O(log n / n), where the constant implied by this O
depends on $y_0$. Thus


I i 


^n,k  = (^)j”  (l-(l-lct)e-^^)(l.e-'^")^-V-k)e-^(^-^)"dt^^^ 

+ ( ^ ) O^J  (l-e"^)^"\(n-k)e'^^*^"^-)^dt  ^ 

= ^^)(n-k)J  (1  - (1  - In  x)x)(l-x)^~^x^"^~^dx^l+ 0^ 

+ 0 ^ j (n-k)  ^ (l-x)^"^x^"^'^dx  ^ 

where  1-Zq  = exp(-yQ)  . The  latter  integral  is  clearly  less  than  , 
and  by  choosing  Zq  sufficiently  small  as  a function  of  e we  can  ensure 
that 

k ^ en  , -n 

^0  ^ ^0  = 5 ; 

this  is  small  enough  to  wipe  out  the  contribution  from  ( k j(n-k)  , so 
the  correction  term  is  negligible.  The  first  integral  is 

/,  sk  n-k-1,  , xk-1  n-k  , 

J (1-x)  X J (l~x)  X In  X dx 

0 0 

k:(n-k-l):  (k-l)J(n-k)’  ^ 1 , 1 , 

n:  ■ n:  \^n  n-1  •••  n-k+1  } * ^ 


Part (a) of this theorem implies that

(7.4)  $E_{n,k} \sim n^{1-2a}$  for $k = n^a$, $a < 2/3$ ;

this is rather striking when 1/2 < a < 2/3, since it approaches $n^{-1/3}$.
Apparently the components of a random graph tend to grow very rapidly once
they get to this size range; they must move quickly past such values of k.

The approximation for $E_{n,k}$ in part (b) of the theorem,

$E_{n,k} \approx 1 - \frac{n-k}{k}(H_n - H_{n-k})$ ,

has the right order of growth when $k = n^{2/3}$, but it has been proved only
for $k \ge \epsilon n$.
At any rate we can determine the asymptotic value of (7.1) without
knowing too much about $E_{n,k}$ in the middle range of k. The sum of
$kE_{n,k}$ for $k \le \epsilon n$ is at most $\epsilon n^2$, since it is obvious that
$E_{n,k} \le \lfloor n/k \rfloor$ for all k. (All components of size k formed during
the algorithm are disjoint, so there are never more than $\lfloor n/k \rfloor$ of them.)
The sum of $kE_{n,k}$ for $k > \epsilon n$ differs from $n^2/4$ by at most
$\epsilon n^2 + O(n \log n)$, since

$\sum_{1 \le k < n} \bigl(k - (n-k)(H_n - H_{n-k})\bigr) = \frac12 \binom{n}{2}$

and each term in this sum is less than n. Thus

(7.6)  $(\tfrac18 - \delta)\, n^2 \le \frac12 \sum_{1 \le k < n} k\, E_{n,k} \le (\tfrac18 + \delta)\, n^2$

for all $\delta > 0$ and all sufficiently large n; the running time is
asymptotically $n^2/8$, a factor of order n times what it was in the
weighted case. It is tempting to conjecture that a stronger result
actually holds, namely


(7.7)  $\frac12 \sum_{1 \le k < n} k\, E_{n,k} = \frac18 n^2 + \frac13 n \ln n + O(n)$ ,

since $\sum_{1 \le k \le n^{2/3}} k\, E_{n,k} \approx (2/3)\, n \ln n$.

A comparison of formulas (3.3) and (7.2) shows that $E_{n,n-1} = 2E_{n,1,n-1}$,
and indeed this relation is obvious by the nature of the equivalence
algorithm, since any component of size n-1 must be merged with the
remaining singleton element. Theorem 3(b) now yields

(7.8)  $E_{n,1,n-1} \approx \frac12$ ;

hence (4.1) does not hold in general.
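The summation used in the asymptotic argument of this section can be verified exactly for small n. The sketch below assumes that the part (b) approximation is $1 - \frac{n-k}{k}(H_n - H_{n-k})$ and that the identity in question is $\sum_{1 \le k < n} (k - (n-k)(H_n - H_{n-k})) = \frac12\binom{n}{2}$; both are readings of damaged formulas, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(Fraction(1, j) for j in range(1, n + 1))


def part_b_estimate(n, k):
    """The quantity 1 - ((n-k)/k)(H_n - H_{n-k}), as read from part (b)."""
    return 1 - Fraction(n - k, k) * (harmonic(n) - harmonic(n - k))


n = 12
total = sum(k * part_b_estimate(n, k) for k in range(1, n))
print(total, Fraction(comb(n, 2), 2))   # both print 33
```

Half of this total is about $n^2/8$, which matches the asymptotic running time claimed for the unweighted algorithm.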
8.  Numerical Results.

Some Monte Carlo experiments were made to test the above theory;
for each value of n, random edges {x,y} were generated until the
corresponding graph was connected, and this process was repeated ten
times. Here are the results (with "±" indicating one unit of standard
deviation):

     n     Observed cost, QF     n^2/8 + (1/3) n ln n     Observed cost, QFW
     2         1.0 ± 0.0                   0.96               1.0 ± 0.0
     4         4.3 ± 0.1                   3.85       [illegible] ± 0.2
     8        15.7 ± 0.3                  13.5                8.5 ± 0.2
    16        50.8 ± 2.4                  46.8               20.2 ± 0.8
    32       178.4 ± 5.7                 165.0               45.6 ± 0.9
    64         658 ± 19                  600.7               99.0 ± 1.9
   128        2575 ± 71                 2255.0              212.5 ± 4.4
   256        8609 ± 153                8665.2              451.2 ± 7.7
   512       55938 ± 590               33832.7                936 ± 13
  1024      155012 ± 972              133437.9               1941 ± 15
  2048      532657 ± 5969             529493.1               3955 ± [illegible]
  4096     2150655 ± 11233           2108508.5               7927 ± 49

Note that the values in the unweighted case conform well to the predicted
asymptotic behavior, and the values in the weighted case seem to be less
than 1.95n.

For small n it is possible to calculate exact values without great
difficulty; e.g., when n = 4 we readily find

$E_{4,1,1} = \frac65$ ,  $E_{4,1,2} = E_{4,1,3} = \frac25$ ,  $E_{4,2,2} = \frac15$ ;

hence the true average costs of the unweighted and weighted algorithms are
respectively 4.4 and 3.2.
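Exact averages of this kind can be computed mechanically for small n. The sketch below is an illustration under stated assumptions: it models the process by class sizes only, merging classes of sizes k and m with probability proportional to km, charging min(k,m) for the weighted algorithm and (k+m)/2 on average for the unweighted one. The function name `exact_average_costs` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def exact_average_costs(n):
    """Exact expected costs (unweighted, weighted) in the random graph model."""

    @lru_cache(maxsize=None)
    def expected(sizes):
        # sizes: sorted tuple of the current class sizes
        if len(sizes) == 1:
            return Fraction(0), Fraction(0)
        total_pairs = sum(sizes[i] * sizes[j]
                          for i in range(len(sizes))
                          for j in range(i + 1, len(sizes)))
        qf = qfw = Fraction(0)
        for i in range(len(sizes)):
            for j in range(i + 1, len(sizes)):
                k, m = sizes[i], sizes[j]
                p = Fraction(k * m, total_pairs)   # chance the next merge joins these two
                rest = sizes[:i] + sizes[i + 1:j] + sizes[j + 1:] + (k + m,)
                sub_qf, sub_qfw = expected(tuple(sorted(rest)))
                qf += p * (Fraction(k + m, 2) + sub_qf)    # either class renamed, 50-50
                qfw += p * (min(k, m) + sub_qfw)           # smaller class renamed
        return qf, qfw

    return expected((1,) * n)


print(exact_average_costs(4))   # (Fraction(22, 5), Fraction(16, 5)), i.e. 4.4 and 3.2
```

The state space is the set of integer partitions of n, so the same function remains practical for values such as n = 8 or n = 16, whose results can then be set beside the figures quoted in this section.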


When n = 8 the $E_{8,k,m}$ values are respectively

m B 1 

m = 2 

m = 3 

m = 4 

m = 5 

m = 6 

m = 7 

, , 28 

k = 1 ^ 

28 

51 

60 

209 

5096 

24635 

3046 

15249 

168 

715 

1929822 

5511.755 

k = 2 

2 

11 

I2S5 

13054 

167739 

66958 

338695 

k = 3 

292 

TIW 

9472 

187475 

214482 

5511735 

k = 4 

50881 

937365 

and the average costs are respectively 16290696/1062347 ≈ 15.3 and
12265252/1448655 ≈ 8.47. Except for the fact that the denominators
are composed of small prime factors (e.g., 1062347 = 11·13·17·19·23),
there appears to be no simple pattern to these numbers. (It is easy to
bound the size of the prime factors by proving that 2((k+m)(n-(k+m+1)/2))! $E_{n,k,m}$
is an integer.)

The following tableau shows $E_{n,k,m}$ and $E_{n,m}$ when n = 16 and
$k \le m$:

           k = 1   k = 2   k = 3   k = 4   k = 5   k = 6   k = 7   k = 8    E_{n,m}
  m = 1    4.138                                                           16.000
  m = 2    0.976   0.294                                                    4.138
  m = 3    0.449   0.148   0.079                                            1.951
  m = 4    0.274   0.095   0.052   0.035                                    1.191
  m = 5    0.198   0.071   0.039   0.027   0.020                            0.846
  m = 6    0.160   0.058   0.033   0.022   0.017   0.014                    0.665
  m = 7    0.141   0.052   0.029   0.020   0.014   0.011   0.008            0.565
  m = 8    0.133   0.049   0.027   0.018   0.013   0.009   0.006   0.002    0.511
  m = 9    0.133   0.048   0.026   0.017   0.011   0.006   0.005            0.487
  m = 10   0.140   0.050   0.026   0.015   0.008   0.003                    0.485
  m = 11   0.156   0.053   0.026   0.013   0.004                            0.504
  m = 12   0.182   0.058   0.024   0.008                                    0.543
  m = 13   0.224   0.061   0.017                                            0.604
  m = 14   0.290   0.056                                                    0.692
  m = 15   0.407                                                            0.814

Note that $E_{16,2,12} < E_{16,2,13} > E_{16,2,14}$; the values of $E_{n,k,m}$ are
not convex in general. The true average costs for n = 16 are 51.120
and 20.352; thus the Monte Carlo results appear to be valid.
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9.  Another  Model  for  Average  Cost. 


We might also wish to study the average behavior of an equivalence algorithm under the assumption that the operations consist of the edges of a random spanning tree in random order; thus, we assume that the $n^{n-2}(n-1)!$ possible sequences of union operations of the form "merge $R[x_1]$ with $R[y_1]$; ...; merge $R[x_{n-1}]$ with $R[y_{n-1}]$" are equally likely.

The difference between this model and the previous one can be seen in the case n = 4: There are 12 spanning trees which form a hamiltonian path (type 1), and 4 which form a "star" (type 2). After creating the first component {a,b} of size 2, the new algorithm will create a disjoint second component {c,d} with probability 1/3 if the tree is to be type 1, and never if it is to be type 2; hence the overall probability is (12/16)(1/3) = 1/4 that two disjoint components of size 2 are formed. The random process we have studied above, however, will create {c,d} with probability 1/5, since {c,d} is only one of five inequivalent pairs that might occur next. The new model is qualitatively different from the old because it makes the merging of two large components significantly more probable; thus, we would not expect the weighted rule to give such a substantial improvement over the unweighted rule when using this model.
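
The two probabilities in the n = 4 example can be confirmed by exhaustive enumeration. The following is a minimal sketch, not taken from the paper; the function names are illustrative, and the random graph model is assumed to pick the next union uniformly among the pairs that are still inequivalent, as the paragraph above describes.

```python
from fractions import Fraction
from itertools import combinations, permutations

def is_spanning_tree(edges, n):
    """n-1 edges form a spanning tree iff no edge closes a cycle."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True

def spanning_tree_model(n=4):
    """Probability that the first two unions involve four distinct elements
    when every (spanning tree, edge order) pair is equally likely.
    Also returns the number of sequences enumerated."""
    pairs = list(combinations(range(n), 2))
    hits = total = 0
    for tree in combinations(pairs, n - 1):
        if not is_spanning_tree(tree, n):
            continue
        for order in permutations(tree):
            total += 1
            if len(set(order[0]) | set(order[1])) == 4:
                hits += 1
    return Fraction(hits, total), total

def random_graph_model(n=4):
    """After the first union {a,b}, every other pair is still inequivalent;
    count the fraction of them that are disjoint from {a,b}."""
    pairs = list(combinations(range(n), 2))
    first, rest = pairs[0], pairs[1:]
    disjoint = [e for e in rest if not set(e) & set(first)]
    return Fraction(len(disjoint), len(rest))

print(spanning_tree_model())   # (Fraction(1, 4), 96)
print(random_graph_model())    # 1/5
```

The count 96 agrees with $n^{n-2}(n-1)! = 16 \cdot 6$ for n = 4.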

The random spanning tree model has been studied by A. C. Yao [12]; we shall analyze it in a somewhat different way, so that its similarities and differences with respect to the random graph model are clarified.

In the next few sections we shall use the symbols $E_{n,k}$, $E_{n,k,m}$ to represent quantities in the new model analogous to those in the old; in other words, $E_{n,k}$ is the expected number of classes of size k formed during the algorithm, and $E_{n,k,m}$ is the expected number of times we merge a class R[x] of size k with a class R[y] of size m. Note that we must have

(9.1)    $E_{n,\ell} = \sum_{1 \le k < \ell} E_{n,k,\ell-k}$

in both models when $\ell > 1$, since every class of size > 1 is obtained by merging.

In the new model the ratio $E_{n,k,\ell-k}/E_{n,\ell}$ is independent of n, since the $\ell-1$ unions which form a class of size $\ell$ do not affect the behavior of other unions. More precisely, consider any subset A of $\ell$ elements, and any sequence of unions in which A is formed. Then we can replace the $\ell-1$ unions forming A by any of the $\ell^{\ell-2}(\ell-1)!$ such sequences, obtaining in this way all sequences of n-1 union operations in which class A is formed and the $n-\ell$ other unions are held constant. It follows that

(9.2)    $E_{n,k,\ell-k}/E_{n,\ell} = E_{\ell,k,\ell-k}$,

so we must only determine the numbers $E_{n,k}$ and $E_{n,k,n-k}$ in the new model in order to deduce all the $E_{n,k,m}$ values.

To determine $E_{n,k,n-k}$, consider how many sequences of unions end by merging R[x] with R[y], where class R[x] is a particular set A of size k. There are $k^{k-2}(k-1)!$ sequences of unions which construct A, $(n-k)^{n-k-2}(n-k-1)!$ sequences of unions which connect up the other elements, $\binom{n-2}{k-1}$ ways to intermix these sequences, and $k(n-k)$ unions which could come last, hence

(9.3)    $E_{n,k,n-k} = \frac{1}{2}\binom{n}{k}\,k^{k-2}(k-1)!\,(n-k)^{n-k-2}(n-k-1)!\,\binom{n-2}{k-1}\,k(n-k)\,\Big/\,n^{n-2}(n-1)!$

         $= \frac{1}{2(n-1)}\binom{n}{k}\left(\frac{k}{n}\right)^{k-1}\left(\frac{n-k}{n}\right)^{n-k-1}$.


(As in Equation (3.2) we must include a factor of 1/2 because of the symmetry between x and y.) Note that for fixed k and $\ell$, the asymptotic ratio of $E_{n,k,\ell-k}/E_{n,\ell}$ as $n \to \infty$ in our former model approaches $E_{\ell,k,\ell-k}$, the exact ratio of $E_{n,k,\ell-k}/E_{n,\ell}$ in the present model, by Equation (5.6) and Theorem 3(a). Therefore the new model essentially reflects the "local" behavior of the former model on small components. Alternatively we can regard the spanning tree model as an indication of the "early" behavior of the former model, since

         $E_{\ell,k,\ell-k} = \lim_{\tau \to 0} \frac{E_{n,k,\ell-k}}{E_{n,\ell}}$,

where the quantities on the right are obtained by substituting $\tau$ for $\infty$ in (3.3) and (7.2).
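
The closed form for the final-merge probabilities can be compared with a direct count. The sketch below is illustrative only (the names `p_closed`, `final_merge_sizes` and `p_brute` are not from the paper); it assumes the closed form $\binom{n}{k}k^{k-1}(n-k)^{n-k-1}/(2(n-1)n^{n-2})$ and models the factor 1/2 by letting each edge appear as (x,y) or (y,x) with equal probability.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

def p_closed(n, k):
    """C(n,k) k^(k-1) (n-k)^(n-k-1) / (2 (n-1) n^(n-2))."""
    return Fraction(comb(n, k) * k**(k - 1) * (n - k)**(n - k - 1),
                    2 * (n - 1) * n**(n - 2))

def final_merge_sizes(order, n):
    """Sizes of the two classes joined by the last union, or None if the
    edges contain a cycle and hence are not a spanning tree."""
    comp = {v: {v} for v in range(n)}
    sizes = None
    for x, y in order:
        cx, cy = comp[x], comp[y]
        if cx is cy:
            return None
        sizes = (len(cx), len(cy))
        cx |= cy                  # merge cy into cx in place
        for v in cy:
            comp[v] = cx
    return sizes

def p_brute(n):
    """[p_1, ..., p_{n-1}] by exhaustive enumeration. A final merge of
    sizes {k, m} contributes 1/2 to p_k and 1/2 to p_m."""
    pairs = list(combinations(range(n), 2))
    weight = [Fraction(0)] * n
    total = 0
    for tree in combinations(pairs, n - 1):
        for order in permutations(tree):
            sizes = final_merge_sizes(order, n)
            if sizes is None:
                break             # a cycle is a property of the edge set
            total += 1
            for k in sizes:
                weight[k] += Fraction(1, 2)
    return [w / total for w in weight[1:]]

print(p_brute(4))                              # 3/8, 1/4, 3/8
print([p_closed(4, k) for k in range(1, 4)])   # the same values
```

For n = 4 the value $p_2 = 1/4$ is the probability, computed earlier in this section, that two disjoint components of size 2 are formed.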

Let $p_k = E_{n,k,n-k}$ be the probability that the final union is a (k,n-k)-merge, and let $C_n^{QFW}$, $C_n^{QF}$ be the average total cost of unions in the weighted and unweighted equivalence algorithms, respectively. The independence argument by which we established (9.2) shows also that

(9.4)    $C_n^{QFW} = \sum_{0<k<n} p_k\,\bigl(\min(k,n-k) + C_k^{QFW} + C_{n-k}^{QFW}\bigr)$,

(9.5)    $C_n^{QF} = \sum_{0<k<n} p_k\,\bigl(k + C_k^{QF} + C_{n-k}^{QF}\bigr)$,

because the behavior of the algorithm within the classes of sizes k and n-k is the same as its behavior on classes of total size k and n-k.

A. C. Yao [12] has proved that $C_n^{QFW} \asymp n \log n$, $C_n^{QF} \asymp n^{3/2}$, using a different approach to the analysis; by studying recurrences (9.4) and (9.5), we will be able to obtain more precise results.
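
The two recurrences are easy to evaluate exactly for small n. The sketch below is not from the paper; it assumes that both recurrences have the form $C_n = \sum_{0<k<n} p_k(\text{cost} + C_k + C_{n-k})$, with cost k in the unweighted case and min(k, n-k) in the weighted case, and with $p_k$ given by the closed form for $E_{n,k,n-k}$.

```python
from fractions import Fraction
from math import comb

def p(n, k):
    """Probability that the final union is a (k, n-k)-merge."""
    return Fraction(comb(n, k) * k**(k - 1) * (n - k)**(n - k - 1),
                    2 * (n - 1) * n**(n - 2))

def average_costs(n_max):
    """Return lists (qf, qfw); qf[n] and qfw[n] are the exact average total
    union costs on n elements for the unweighted and weighted algorithms."""
    qf = [Fraction(0)] * (n_max + 1)
    qfw = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        for k in range(1, n):
            w = p(n, k)
            qf[n] += w * (k + qf[k] + qf[n - k])
            qfw[n] += w * (min(k, n - k) + qfw[k] + qfw[n - k])
    return qf, qfw

qf, qfw = average_costs(16)
for n in (2, 4, 8, 16):
    print(n, float(qf[n]), float(qfw[n]))
```

For n = 4 this gives exactly 35/8 = 4.375 and 13/4 = 3.25.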


10.  Solution  of  Recurrences.


According to the equations we have just derived, the average behavior of equivalence algorithms in the spanning tree model can be described by recurrence relations of the general form

(10.1)   $x_n = c_n + \sum_{0<k<n} p_{nk}\,(x_k + x_{n-k})$,

where

(10.2)   $p_{nk} = \frac{1}{2(n-1)}\binom{n}{k}\left(\frac{k}{n}\right)^{k-1}\left(\frac{n-k}{n}\right)^{n-k-1}$.

Before considering this particular recurrence in detail, it will be interesting to deduce properties implied by (10.1) for any choice of the $p_{nk}$ such that $\sum_k p_{nk} = 1$, since such recurrences arise also in the solution of several other algorithms (e.g., in studies of quicksort and of digital search trees). If $c_1 = 1$ and $c_n = 0$ for all n > 1 it is immediate that $x_n = n$ for all n; similarly if $c_1 = 0$ and $c_n = 1$ for all n > 1 we have $x_n = n-1$ for all n. In general $x_n$ is a monotone function of $(c_1, \ldots, c_n)$, hence these particular solutions allow us to conclude that

(10.3)   $c_n = O(1)$ implies $x_n = O(n)$.
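
The two particular solutions hold for every weight function whose rows sum to 1, which a short computation can illustrate. The sketch below assumes the recurrence has the form $x_n = c_n + \sum_{0<k<n} p_{nk}(x_k + x_{n-k})$; the two weight functions `uniform` and `lopsided` are arbitrary choices made for this illustration, not taken from the paper.

```python
from fractions import Fraction

def solve(c, p):
    """Solve x_n = c_n + sum_{0<k<n} p(n,k) (x_k + x_{n-k}) for
    n = 1 .. len(c)-1; c is indexed from 1 and c[0] is unused."""
    x = [Fraction(0)] * len(c)
    for n in range(1, len(c)):
        x[n] = c[n] + sum(p(n, k) * (x[k] + x[n - k]) for k in range(1, n))
    return x

def uniform(n, k):
    return Fraction(1, n - 1)            # sums to 1 over 0 < k < n

def lopsided(n, k):
    return Fraction(2 * k, n * (n - 1))  # also sums to 1 over 0 < k < n

N = 12
only_c1 = [0, 1] + [0] * (N - 1)         # c_1 = 1, c_n = 0 for n > 1
all_but_c1 = [0, 0] + [1] * (N - 1)      # c_1 = 0, c_n = 1 for n > 1
for p in (uniform, lopsided):
    print(solve(only_c1, p)[1:] == list(range(1, N + 1)),
          solve(all_but_c1, p)[1:] == list(range(0, N)))
```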


Let us now specialize (10.1) to the case that

(10.4)   $p_{nk} = r(k)\,r(n-k)/s(n)$

for some functions r and s, where r(n) = 0 for n ≤ 0 and

(10.5)   $s(n) = \sum_k r(k)\,r(n-k)$.

Clearly (10.2) has this form, with $r(n) = n^{n-1}/n!$, n ≥ 1, and $s(n) = 2(n-1)n^{n-2}/n!$. When $p_{nk} = p_{n(n-k)}$ we can replace (10.1) by

(10.6)   $x_n = c_n + 2\sum_{0<k<n} p_{nk}\,x_k$.

If we can find sequences $\langle x_n \rangle$ such that $\sum_k p_{nk}x_k$ has a simple form, we can insert the corresponding values into (10.6) and obtain a sequence $\langle c_n \rangle$ with a known solution; linear combinations of these special sequences $\langle c_n \rangle$ can then be used to obtain many further solutions. The form of (10.4) suggests that we try $x_n = r(n-m)/r(n)$ for some fixed nonnegative integer m; then we have

         $\sum_k p_{nk}\,x_k = s(n-m)/s(n)$,

hence $x_n^{(m)} = r(n-m)/r(n)$ is the solution to (10.6) when

(10.7)   $c_n = c_n^{(m)} = \frac{r(n-m)}{r(n)} - 2\,\frac{s(n-m)}{s(n)}$.

If r(n) ≠ 0 for n ≥ 1, we can obtain any sequence $\langle c_n \rangle$ as a (possibly infinite) linear combination of the special sequences $\langle c_n^{(m)} \rangle$, since $c_n^{(m)} = 0$ for n ≤ m and $c_{m+1}^{(m)} = r(1)/r(m+1) \ne 0$; the solution to (10.6) will then be the same linear combination of the sequences $\langle x_n^{(m)} \rangle$.

In our case (10.2), we find for example when m = 1 that $x_n = (1-1/n)^{n-2}$ solves (10.1) when $c_n = (1-1/n)^{n-2}\,(2/(n-1)^2 - 1)$ for n ≥ 2. However, this general approach does not seem to lead to sufficiently simple formulas, so we shall now restrict consideration to the particular case (10.2), when more powerful techniques can be used.
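
The special solutions of this section can be verified mechanically. The sketch below is illustrative; it assumes $r(n) = n^{n-1}/n!$, $s(n) = \sum_k r(k)r(n-k)$, $p_{nk} = r(k)r(n-k)/s(n)$, the recurrence $x_n = c_n + 2\sum_{0<k<n} p_{nk}x_k$ for n ≥ 2, and the special right-hand side $c_n^{(m)} = r(n-m)/r(n) - 2s(n-m)/s(n)$.

```python
from fractions import Fraction
from math import factorial

def r(n):
    return Fraction(n**(n - 1), factorial(n)) if n >= 1 else Fraction(0)

def s(n):
    return sum((r(k) * r(n - k) for k in range(1, n)), Fraction(0))

def c_special(m, n):
    """c_n^(m) for n >= 2 (s(1) = 0, so n = 1 is excluded)."""
    return r(n - m) / r(n) - 2 * s(n - m) / s(n)

def check_special_solution(m, n_max):
    """True if x_n = r(n-m)/r(n) satisfies the recurrence with c_n^(m)
    for all 2 <= n <= n_max."""
    x = lambda n: r(n - m) / r(n)
    for n in range(2, n_max + 1):
        rhs = c_special(m, n) + 2 * sum(r(k) * r(n - k) / s(n) * x(k)
                                        for k in range(1, n))
        if x(n) != rhs:
            return False
    return True

print(all(check_special_solution(m, 12) for m in range(5)))
# The case m = 1 in closed form:
print(all(r(n - 1) / r(n) == Fraction(n - 1, n) ** (n - 2)
          for n in range(2, 13)))
```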
11.  Solution  of  the  Spanning  Tree  Recurrence.

Let us assume that $c_1 = 0$, since we have already determined the dependence of $x_n$ on $c_1$. When $p_{nk}$ is given by (10.2), we can multiply both sides of (10.6) by $(n-1)n^{n-1}/n!$, obtaining

(11.1)   $\frac{(n-1)\,n^{n-1}x_n}{n!} = d_n + n\sum_{0<k<n}\frac{k^{k-1}x_k}{k!}\,\frac{(n-k)^{n-k-1}}{(n-k)!}$,

where

(11.2)   $d_n = (n-1)\,n^{n-1}c_n/n!$.

The form of (11.1) suggests that we introduce the generating functions

(11.3)   $G(z) = \sum_{n \ge 2}\frac{n^{n-1}x_n}{n!}\,z^n$,

(11.4)   $F(z) = \sum_{n \ge 1}\frac{n^{n-1}}{n!}\,z^n$,

(11.5)   $D(z) = \sum_{n \ge 2} d_n z^n$,

and we obtain the equivalent relation

(11.6)   $G'(z) - z^{-1}G(z) = z^{-1}D(z) + \frac{d}{dz}\bigl(F(z)G(z)\bigr) = z^{-1}D(z) + F'(z)G(z) + F(z)G'(z)$.

It is well known (see e.g. [4, p. 392]) that this particular function F(z) satisfies

(11.7)   $F(z) = z\,e^{F(z)}$;

hence

(11.8)   $F'(z) = \frac{F(z)}{z\,(1-F(z))}$.

We can now multiply (11.6) by 1/F(z) and rewrite it as

(11.9)   $\frac{d}{dz}\left(\left(\frac{1}{F(z)} - 1\right)G(z)\right) = \frac{D(z)}{zF(z)}$;

the solution with $c_1 = 0$ is

(11.10)  $G(z) = \frac{F(z)}{1-F(z)}\int_0^z \frac{D(w)\,dw}{wF(w)}$.
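
The functional equation for F(z) and its consequence for F'(z) can be checked on truncated power series. The sketch below is an illustration under stated assumptions: F(z) is taken to be $\sum_{n \ge 1} n^{n-1}z^n/n!$, the two identities tested are $F(z) = z\,e^{F(z)}$ and $zF'(z)(1-F(z)) = F(z)$, and the truncation order N is an arbitrary choice.

```python
from fractions import Fraction
from math import factorial

N = 12  # number of series coefficients compared (arbitrary)

def mul(a, b):
    """Product of two power series (coefficient lists), truncated to N terms."""
    out = [Fraction(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
    return out

def exp_series(f):
    """exp(f) for a series with f[0] == 0, using n e_n = sum_k k f_k e_{n-k}."""
    e = [Fraction(1)] + [Fraction(0)] * (N - 1)
    for n in range(1, N):
        e[n] = sum(k * f[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

# Coefficients of F(z) = sum_{n>=1} n^(n-1) z^n / n!
F = [Fraction(0)] + [Fraction(n**(n - 1), factorial(n)) for n in range(1, N)]
z = [Fraction(0), Fraction(1)] + [Fraction(0)] * (N - 2)

z_exp_F = mul(z, exp_series(F))                 # right side of F = z e^F
zF_prime = [n * F[n] for n in range(N)]         # coefficients of z F'(z)
one_minus_F = [Fraction(1)] + [-c for c in F[1:]]
cleared = mul(zF_prime, one_minus_F)            # z F'(z) (1 - F(z))

print(z_exp_F == F, cleared == F)
```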

Let us now imitate our procedure of the previous section, finding a set of functions $D_m(z)$ such that the integral in (11.10) has a simple form and then expressing the general case as a linear combination of these special ones. It is natural to set

(11.11)  $D_m(z) = zF(z)^mF'(z) = F(z)^{m+1}/(1-F(z))$;

then the corresponding generating function is

         $G_m(z) = \frac{1}{m}\,\frac{F(z)^{m+1}}{1-F(z)} = \frac{1}{m}\,D_m(z)$.

(In other words, $D_m$ is an eigenfunction of the linear mapping $D \mapsto G$ defined by (11.10), with eigenvalue 1/m.) To find the power series expansion of $D_m(z)$, we use Lagrange's general inversion formula, according to which the relations $z = tf(t) = t + f_1t^2 + f_2t^3 + \cdots$ and $1 + w_1z + w_2z^2 + \cdots = g(t) = 1 + g_1t + g_2t^2 + \cdots$ imply that $nw_n$ is the coefficient of $t^{n-1}$ in $g'(t)f(t)^{-n}$. Letting t = F(z), $f(t) = e^{-t}$, $g(t) = t^{m+1}/(1-t)$, we obtain $nw_n = \sum_{0 \le k < n-m} n^k(n-k)/k! = n^{n-m}/(n-m-1)!$, hence

         $d_n = \frac{n^{n-m-1}}{(n-m-1)!}$   for n ≥ 2.

The corresponding c's, according to (11.2), are given by

(11.13)  $c_n = c_n^{(m)} = \frac{(n-2)!}{n^{m-1}(n-m-1)!} = \frac{n-2}{n}\,\frac{n-3}{n}\cdots\frac{n-m}{n}$.

We have proved the following result:

Lemma 3. Let m be a positive integer. The solution to (10.1), (10.2) is

(11.14)  $x_n = \frac{n}{m}\,\frac{n-1}{n}\,\frac{n-2}{n}\cdots\frac{n-m}{n}$

when $c_n = c_n^{(m)}$ is the sequence defined in (11.13). □


In  order  to  translate  Lemma  3 into  a more  useful  form,  let  us  write 
(cf.  [^]) 


(11.15)  $Q\langle a_0,a_1,a_2,\ldots\rangle(n) = a_0 + a_1\,\frac{n-1}{n} + a_2\,\frac{n-1}{n}\,\frac{n-2}{n} + \cdots$

By successively setting n = 1, 2, 3, ... in this formula we see that any function of the positive integer n can be written as $Q\langle a_0,a_1,a_2,\ldots\rangle(n)$ for some sequence $\langle a_0,a_1,a_2,\ldots\rangle$, and if we are lucky the a's will form a nice pattern.

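Suppose $c_n = Q\langle a_0,a_1,a_2,\ldots\rangle(n)$ where $a_m = 1$ and all the other $a_k$ are zero. We have

(11.16)  $c_n = \frac{n-1}{n}\,\frac{n-2}{n}\cdots\frac{n-m}{n} = \frac{m}{m+1}\,c_n^{(m)} + \frac{1}{m+1}\,c_n^{(m+1)}$,

so the solution must be

(11.17)  $x_n = \left(\frac{n-1}{m+1} + \frac{n}{(m+1)^2}\right)\frac{n-1}{n}\,\frac{n-2}{n}\cdots\frac{n-m}{n}$;

note that this works also when m = 0. Therefore Lemma 3 can be rephrased as follows:

Corollary. The solution to (10.1), (10.2) when $c_n = Q\langle a_0,a_1,a_2,\ldots\rangle(n)$ is

(11.18)  $x_n = (n-1)\,Q\langle a_0, a_1/2, a_2/3, \ldots\rangle(n) + n\,Q\langle a_0, a_1/4, a_2/9, \ldots\rangle(n)$.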
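
The Corollary can be tested numerically on arbitrary coefficient sequences. The sketch below is illustrative and rests on assumptions spelled out here: Q⟨a⟩(n) is taken to be $a_0 + a_1\frac{n-1}{n} + a_2\frac{(n-1)(n-2)}{n^2} + \cdots$, the recurrence is $x_n = c_n + 2\sum_{0<k<n}p_{nk}x_k$ with $x_1 = c_1$, and the claimed solution is $(n-1)\,Q\langle a_k/(k+1)\rangle(n) + n\,Q\langle a_k/(k+1)^2\rangle(n)$. The test sequences are arbitrary.

```python
from fractions import Fraction
from math import comb

def p(n, k):
    return Fraction(comb(n, k) * k**(k - 1) * (n - k)**(n - k - 1),
                    2 * (n - 1) * n**(n - 2))

def Q(a, n):
    """Q<a_0, a_1, ...>(n); the terms vanish from index n on, because the
    product (n-1)(n-2)... then contains the factor n - n = 0."""
    total, falling = Fraction(0), Fraction(1)
    for k in range(n):
        total += a(k) * falling
        falling *= Fraction(n - 1 - k, n)
    return total

def solve(c, n_max):
    """x_n = c(n) + 2 sum_{0<k<n} p(n,k) x_k for n = 1 .. n_max."""
    x = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        x[n] = c(n) + 2 * sum(p(n, k) * x[k] for k in range(1, n))
    return x

def corollary(a, n):
    return ((n - 1) * Q(lambda k: Fraction(a(k), k + 1), n)
            + n * Q(lambda k: Fraction(a(k), (k + 1) ** 2), n))

def agrees(a, n_max=12):
    x = solve(lambda n: Q(a, n), n_max)
    return all(x[n] == corollary(a, n) for n in range(1, n_max + 1))

print(agrees(lambda k: 1), agrees(lambda k: k + 1),
      agrees(lambda k: (-1) ** k * (k * k + 3)))
```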


12.  Application  to  the  Spanning  Tree  Model.


Let us now use the results of the previous section to determine the average behavior of the spanning tree model. First we shall study some special cases of the general Q function defined in (11.15). It is not difficult to verify that

(12.1)   $Q\langle 1,2,3,\ldots\rangle(n) = n$;

furthermore

(12.2)   $Q\langle 1,1,1,\ldots\rangle(n) = Q(n) = \sqrt{\pi n/2} - \tfrac{1}{3} + O(n^{-1/2})$

is the function discussed in (2.14). Let us now write $Q_0(n) = n$, $Q_1(n) = Q(n)$ and

(12.3)   $Q\langle 1,\tfrac12,\tfrac13,\ldots\rangle(n) = Q_2(n)$,

(12.4)   $Q\langle 1, 2^{1-k}, 3^{1-k}, \ldots\rangle(n) = Q_k(n)$.

M. D. Kruskal has proved [7] that

(12.5)   $Q_2(n) = \tfrac12\ln n + \tfrac12(\gamma + \ln 2) + o(1)$,

and it is obvious that

         $Q_3(n) < 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = O(1)$.

According to Equation (11.18),

(12.6)   $c_n = Q_k(n)$ implies $x_n = (n-1)\,Q_{k+1}(n) + n\,Q_{k+2}(n)$.

Combining this with (10.3) and the above estimates, we see that

(12.7)   $c_n = a\sqrt{n} + O(1)$ implies $x_n = \frac{a}{\sqrt{2\pi}}\,n\ln n + O(n)$,

for any constant a, since $c_n = (2a/\sqrt{2\pi})\,Q_1(n) + O(1)$. Similarly we can improve (10.3) to

(12.8)   $c_n = O(\log n)$ implies $x_n = O(n)$.

For the unweighted algorithm, we have $c_n = n/2$ for n ≥ 2 (cf. (9.5)), hence the average cost of unweighted unions can be expressed in "closed form" as

(12.9)   $C_n^{QF} = \tfrac12(n-1)\,Q(n) + \tfrac12 n\,Q_2(n) - \tfrac12 n$

         $= \sqrt{\pi/8}\;n^{3/2} + \tfrac14 n\ln n + \bigl(\tfrac14(\gamma + \ln 2) - \tfrac23\bigr)n + o(n)$.
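
The closed form for the unweighted cost can be compared with the recurrence it is meant to solve. The following sketch is illustrative and assumes $Q_j(n) = Q\langle 1, 2^{1-j}, 3^{1-j}, \ldots\rangle(n)$, the closed form $\tfrac12(n-1)Q_1(n) + \tfrac12 nQ_2(n) - \tfrac12 n$, and the recurrence $x_n = n/2 + 2\sum_{0<k<n}p_{nk}x_k$ with $x_1 = 0$; the asymptotic estimate is shown only for orientation, since its error term is o(n).

```python
from fractions import Fraction
from math import comb, log, pi, sqrt

def p(n, k):
    return Fraction(comb(n, k) * k**(k - 1) * (n - k)**(n - k - 1),
                    2 * (n - 1) * n**(n - 2))

def Q_j(j, n):
    """Q_j(n) = sum_{i>=0} (i+1)^(1-j) (n-1)(n-2)...(n-i) / n^i."""
    total, falling = Fraction(0), Fraction(1)
    for i in range(n):
        total += falling * Fraction(i + 1) ** (1 - j)
        falling *= Fraction(n - 1 - i, n)
    return total

def qf_recurrence(n_max):
    x = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        x[n] = Fraction(n, 2) + 2 * sum(p(n, k) * x[k] for k in range(1, n))
    return x

def qf_closed(n):
    return (Fraction(n - 1, 2) * Q_j(1, n)
            + Fraction(n, 2) * Q_j(2, n) - Fraction(n, 2))

def qf_asymptotic(n):
    gamma = 0.5772156649015329            # Euler's constant
    return (sqrt(pi / 8) * n**1.5 + 0.25 * n * log(n)
            + (0.25 * (gamma + log(2)) - 2 / 3) * n)

x = qf_recurrence(20)
print(all(x[n] == qf_closed(n) for n in range(1, 21)))
print(float(x[16]), qf_asymptotic(16))
```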

For the weighted algorithm, we must sum

(12.10)  $c_n = \sum_{0<k<n} p_{nk}\,\min(k,n-k)$,

but this does not appear to have a simple closed form. By arguing as in Lemma 1, we have

         $p_{nk} = \frac{1}{2\sqrt{2\pi}}\,\frac{n^{3/2}}{k^{3/2}(n-k)^{3/2}}\left(1 + O\!\left(\frac{1}{k}\right) + O\!\left(\frac{1}{n-k}\right)\right)$,

hence

         $c_n = 2\sum_{0<k<n/2}\frac{1}{2\sqrt{2\pi}}\,\frac{n^{3/2}}{k^{1/2}(n-k)^{3/2}} + O(1)$.

By Euler's summation formula,

         $\sum_{0<k<n/2}\frac{1}{k^{1/2}(n-k)^{3/2}} = \int_0^{n/2}\frac{dx}{x^{1/2}(n-x)^{3/2}} + O(n^{-3/2}) = \frac{2}{n} + O(n^{-3/2})$,

hence $c_n = \sqrt{2n/\pi} + O(1)$. Relation (12.7) now yields the asymptotic behavior of the algorithm in the weighted case,

(12.11)  $C_n^{QFW} = \frac{1}{\pi}\,n\ln n + O(n)$.

We have proved

Theorem 4. The average number of times the QFW algorithm changes entries in its R table while doing n-1 set unions, under the spanning tree model, is $\pi^{-1}n\ln n + O(n)$; the (unweighted) QF algorithm makes $(\pi/8)^{1/2}n^{3/2} + O(n\log n)$ such changes, on the average. □


Here are the results of empirical tests analogous to those in Section 8, using the spanning tree model; the middle column agrees with the leading terms $\sqrt{\pi/8}\,n^{3/2} + \tfrac14 n\ln n$ of (12.9):

    n       QF (observed)       estimate     QFW (observed)
    4         4.3 ± 0.1            6.4         3.4 ± 0.2
    8        14.5 ± 0.5           18.3         9.0 ± 0.2
   16        44.2 ± 1.9           51.2        22.6 ± 0.6
   32         155 ± 9              141        52.1 ± 2.2
   64         343 ± 13             387       121.2 ± 2.7
  128         992 ± 47            1063       274.6 ± 5.9
  256        2980 ± 210           2922         580 ± 9
  512        7490 ± 520           8058        1350 ± 21
 1024       22450 ± 1765         22309        2837 ± 56
 2048       56637 ± 5980         61984        6175 ± 80
 4096      169628 ± 12930       172792       13496 ± 266

The true values of $(C_n^{QF}, C_n^{QFW})$ for n = 2, 4, 8, 16 are respectively (1, 1), (4.375, 3.25), (14.62, 8.85), (44.26, 22.09).
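
The unlabeled middle column of the table above appears to be the two leading terms of the unweighted estimate, $\sqrt{\pi/8}\,n^{3/2} + \tfrac14 n\ln n$. This reading is an inference, not something the source states; the sketch below simply compares that expression with the listed entries (with the n = 128 entry read as 1063), and `listed` is transcribed by hand.

```python
from math import log, pi, sqrt

def two_term_estimate(n):
    """sqrt(pi/8) n^(3/2) + (1/4) n ln n."""
    return sqrt(pi / 8) * n**1.5 + 0.25 * n * log(n)

# Middle-column entries as read from the table (assumed transcription).
listed = {4: 6.4, 8: 18.3, 16: 51.2, 32: 141, 64: 387, 128: 1063,
          256: 2922, 512: 8058, 1024: 22309, 2048: 61984, 4096: 172792}

for n, value in listed.items():
    print(n, value, round(two_term_estimate(n), 2))
```

Every listed entry lies within rounding distance of the computed value, which supports the interpretation.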

If we set $c_n = \delta_{nk}$ in recurrence (10.1), the resulting value of $x_n$ will be $E_{n,k}$, the average number of classes of size k. Hence the general solution to (10.1), (10.2) can be written

(12.12)  $x_n = \sum_k c_k\,E_{n,k}$.

We shall complete our study of the recurrence by determining $E_{n,m}$, for fixed m ≥ 2, using the methods of Section 11.


According to (11.5) and (11.10) we have

(12.13)  $G(z) = \frac{F(z)}{1-F(z)}\int_0^z \frac{m^{m-2}}{(m-2)!}\,\frac{w^{m-1}}{F(w)}\,dw$.

This integral can be evaluated by using the known formula (cf. [4, exercise 2.3.4.4-29])

(12.14)  $F(z)^r = r\sum_n \frac{n^{n-1-r}}{(n-r)!}\,z^n$,   r ≠ 0;

the integral becomes

(12.15)  $-\frac{m^{m-2}}{(m-2)!}\sum_{n \ge -1}\frac{n^n}{(n+1)!}\,\frac{z^{n+m}}{n+m} = -\frac{m^{m-2}}{(m-2)!}\sum_{n \ge -1}\frac{n^n(n+2)(n+3)\cdots(n+m-1)}{(n+m)!}\,z^{n+m}$.

We wish to write the latter sum as a linear combination of the functions $z^mF(z)^{-k}$, for 1 ≤ k ≤ m; thus, we set

         $-\sum_{n \ge -m}\frac{n^n(n+2)(n+3)\cdots(n+m-1)}{(n+m)!}\,z^{n+m} = \sum_{1 \le k \le m} b_k\,z^mF(z)^{-k}$

         $= -\sum_{n \ge -m}\frac{n^n}{(n+m)!}\Bigl(\sum_{1 \le k \le m} k\,b_k\,n^{k-1}(n+k+1)\cdots(n+m)\Bigr)z^{n+m}$,

and the b's must satisfy

         $b_1(n+2)\cdots(n+m) + 2b_2\,n(n+3)\cdots(n+m) + \cdots + (m-1)\,b_{m-1}\,n^{m-2}(n+m) + m\,b_m\,n^{m-1} = (n+2)(n+3)\cdots(n+m-1)$.

In  particular, 

- T - l)" 


For fixed m as $n \to \infty$ we have


13.  Union  Trees.


In order to analyze a variety of equivalence class algorithms in a variety of models, we can construct an extended binary tree which retains essentially all of the necessary information about the set union operations which caused classes to merge. Given a sequence of ordered pairs $(x_1,y_1), \ldots, (x_{n-1},y_{n-1})$ such that the unordered pairs $\{x_j,y_j\}$ form a spanning tree on the vertices {1, 2, ..., n}, let the union tree be defined as follows: For 1 ≤ j < n, construct a new node whose left subtree is the union tree for the current component of $x_j$ and whose right subtree is the union tree for the current component of $y_j$. (By "current component" we mean the connected component defined by the preceding edges $\{x_1,y_1\}, \ldots, \{x_{j-1},y_{j-1}\}$.) The union tree for a component of size 1 is a single terminal node.

Thus, for example, the union tree associated with the sequence

(13.1)   [example sequence of pairs and diagram of its union tree]

(The labels shown on these terminal nodes are not really part of the tree; they merely help to indicate the manner of construction.) Note that the union tree has been defined for ordered pairs; if the pair (4,7) of the example were (7,4) instead, the tree would be different. This convention about ordered pairs avoids complications that would otherwise arise when counting binary trees whose left and right subtrees are isomorphic.

We can extend the models of random behavior used above to obtain definitions of random union trees by assuming that each edge {x,y} occurring in the random graph or random spanning tree is equally likely to appear as (x,y) or as (y,x) when the corresponding union tree is being built up. Then each of the (2n-2)!/n!(n-1)! possible binary trees with n terminal nodes will occur with a certain probability. For example, when n = 4 the five possible union trees

(13.2)   [diagrams of the five binary trees with four terminal nodes]

each occur with probability 1/5 in the random graph model, while they occur with the respective probabilities 3/16, 3/16, 1/4, 3/16, 3/16 in the spanning tree model.

The probability of a particular tree T can be calculated in the random graph model by considering the function P(T,t) which denotes the probability that T has been formed at time t. Let |T| be the number of terminal nodes of T; and if |T| > 1 let $T_l$ and $T_r$ be the respective left and right subtrees of the root, so that $|T| = |T_l| + |T_r|$. When |T| = 1 we define P(T,t) = 1, otherwise we let


Then P(T,∞) is the probability that T is formed by the algorithm.

For  example,  when  T is  the  middle  tree  of  (13.2)  it  can  be  shown 


P(T,t)  = F - 3e"^^  + ^ e"5^-2e"^^ 
? 5 


but  for  the  other  four  trees  we  have 

P(T,t)  = 1 - e-5t  * 2 e-5t  - . 

The  sum  of  P(T,t)  over  all  five  trees  T is,  of  course,  Pj^(t)  . 

Although  all  five  trees  will  occur  with  probability  1/5  , the  middle 
tree  tends  to  occur  "faster"  when  it  does  occur,  since  the  middle  function 
is  (e  - e larger  than  the  others. 

Let $T_1$ be the tree with $|T_1| = 1$, and let $T_n$ be the tree with $|T_n| = n$ whose right subtree is $T_{n-1}$; thus $T_n$ is a "degenerate" tree, having the longest path length over all trees with n terminal nodes. For these special trees an inductive argument can be used to express the P function as a fairly simple sum,

(13. M KT  ,t)  . r (-i)"  "■<":})•  ,-k(pn-i-t)t/s  _ 

n 0<k<n  K.  ^.:fn-l-Kj. 


Curiously we have

(13.5)   $P(T_n,\infty) = n!\,(n-1)!/(2n-2)!$,

which is the exact reciprocal of the total number of binary trees; in other words, the degenerate tree occurs just as often as it would in a uniform distribution over trees.
Unfortunately the probabilities P(T,∞) for other trees do not have such simple properties, and for n > 4 the distribution becomes far from uniform. Computer calculations for n = 10 show that the tree

(13.6)   [tree diagram]

has maximum probability over all 18!/10!9! = 4862 binary trees with 10 terminal nodes; its probability is 74615232/35942281 times 1/4862. The least probable trees are obtained by joining two degenerate $T_5$'s; their probability is only 8515905/27199564 times 1/4862. According to results we have already derived, a tree whose left subtree has nearly n/2 terminal nodes will almost never occur for large n.

The tree probabilities in the spanning tree model are much simpler. Let S(T) be the set of all n-1 nonterminal subtrees of T, when |T| = n; then it is not difficult to prove that T occurs in the spanning tree model with probability

(13.7)   $P(T) = \frac{n!}{(2n)^{n-1}}\prod_{\tau \in S(T)}\frac{|\tau|}{|\tau|-1}$.

For the probability is clearly

         $\prod_{\tau \in S(T)}\frac{r(|\tau_l|)\,r(|\tau_r|)}{s(|\tau|)}$,

using the notation of (10.4); and r(n)/s(n) = n/2(n-1).
Incidentally, whenever the probability distribution for trees has the "separable" form

(13.8)   $P(T) = f(|T|)\prod_{\tau \in S(T)} g(|\tau|)$

for some functions f and g, we can use recurrences like (10.1) satisfying property (10.4) to analyze cost functions on the trees. Three examples of such probability distributions appear in [5, exercise 6.5-56].

Once we know the tree probabilities, we can analyze several equivalence algorithms. The cost of tree T in the QFW algorithm is

(13.9)   $C^{QFW}(T) = \sum_{\tau \in S(T)}\min(|\tau_l|,|\tau_r|)$,

and in the unweighted algorithm it is

(13.10)  $C^{QF}(T) = \sum_{\tau \in S(T)}|\tau_l|$.

When the probability model assigns equal probabilities to (x,y) and (y,x), so that all trees obtainable from a given tree by interchanging left and right subtrees are equiprobable, (13.10) can be replaced by one-half the external path length of T, i.e.,

(13.11)  $C^{QF'}(T) = \tfrac12\sum_{\tau \in S(T)}|\tau|$,

because $|\tau_l|$ will be $\tfrac12(|\tau_l| + |\tau_r|) = \tfrac12|\tau|$ on the average. The quantity (13.11) will have the same mean as (13.10), but not the same variance.
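
For small n the tree probabilities and tree costs can be tabulated exhaustively. The sketch below is illustrative: it represents a terminal node by `None` and an internal node by a pair, assumes the spanning tree model probability $\frac{n!}{(2n)^{n-1}}\prod_{\tau\in S(T)}\frac{|\tau|}{|\tau|-1}$, and uses the costs $\sum\min(|\tau_l|,|\tau_r|)$ (weighted) and $\sum|\tau_l|$ (unweighted) summed over nonterminal subtrees.

```python
from fractions import Fraction
from math import factorial

LEAF = None   # terminal node; an internal node is a pair (left, right)

def trees(n):
    """Yield every binary tree with n terminal nodes."""
    if n == 1:
        yield LEAF
        return
    for k in range(1, n):
        for left in trees(k):
            for right in trees(n - k):
                yield (left, right)

def size(t):
    return 1 if t is LEAF else size(t[0]) + size(t[1])

def internal(t):
    """Yield the nonterminal subtrees of t, i.e. the members of S(T)."""
    if t is not LEAF:
        yield t
        yield from internal(t[0])
        yield from internal(t[1])

def prob(t):
    """Probability of t in the spanning tree model (assumed product form)."""
    n = size(t)
    result = Fraction(factorial(n), (2 * n) ** (n - 1))
    for sub in internal(t):
        result *= Fraction(size(sub), size(sub) - 1)
    return result

def cost_qfw(t):
    return sum(min(size(sub[0]), size(sub[1])) for sub in internal(t))

def cost_qf(t):
    return sum(size(sub[0]) for sub in internal(t))

print(sorted(prob(t) for t in trees(4)))          # four of 3/16, one of 1/4
print(sum(prob(t) * cost_qf(t) for t in trees(4)),
      sum(prob(t) * cost_qfw(t) for t in trees(4)))   # 35/8 13/4
```

The averages 35/8 and 13/4 equal the true values 4.375 and 3.25 quoted in Section 12 for n = 4.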

A. C. Yao [12] has analyzed two other algorithms which he calls "quick merge" and "quick merge weighted". It is not difficult to see that we can study the length of "find" operations on the merge steps of these algorithms by considering union trees, using the respective costs

(13.12)  $C^{QM}(T) = \sum_{\tau \in S(T)} C^{QF}(\tau)/|\tau|$,

(13.13)  $C^{QMW}(T) = \sum_{\tau \in S(T)} C^{QFW}(\tau)/|\tau|$,

provided that the probability model we are using assigns equal probability to all sequences $(x_1,y_1), \ldots, (x_{n-1},y_{n-1})$ in which $(x_j,y_j)$ is replaced by $(x'_j,y'_j)$, where $x'_j$ and $y'_j$ are in the same current components as $x_j$ and $y_j$. Both of the models we are considering have this property; in the random graph model these formulas do not account for "find" operations when a redundant edge is encountered. In the spanning tree model we can obtain the average behavior of these two algorithms by solving the recurrences

(13.14)  $C_n^{QM} = C_n^{QF}/n + 2\sum_{0<k<n} p_{nk}\,C_k^{QM}$,

(13.15)  $C_n^{QMW} = C_n^{QFW}/n + 2\sum_{0<k<n} p_{nk}\,C_k^{QMW}$,

as in Section 12 above. From (12.7), (12.8), and Theorem 4 we may conclude that $C_n^{QM} = \tfrac14 n\ln n + O(n)$ and $C_n^{QMW} = O(n)$, thereby confirming and slightly sharpening Yao's results.

Doyle and Rivest [2] have studied equivalence algorithms under a third probability model, assuming that each union takes place between a random pair of equivalence classes present at the time, regardless of the sizes of these classes. Although their model may be unrealistic, it is interesting to note that it leads to union trees with the same probability distribution as that of binary search trees; cf. [5, Section 6.2.2]. For example, the five union trees in (13.2) have the respective probabilities 1/6, 1/6, 1/3, 1/6, 1/6 in this model. Since the first union leaves classes of sizes (2,1,...,1), and since the subsequent behavior of the algorithm is to construct a random union tree from these n-1 classes, it is clear that random union trees with n terminal nodes are obtained from those with n-1 by replacing a random terminal node by a branch node, and this is essentially the same process which produces random binary search trees. We can analyze the four union algorithms in this model by using Equations (9.4), (9.5), (13.14), and (13.15) with the separable probability distribution $p_{nk} = 1/(n-1)$. The resulting solutions are

(13.16)  $C_n^{QF} = n(H_n - 1) = n\ln n + O(n)$;

         $C_n^{QFW} = nH_n - \tfrac12 nH_{\lfloor n/2\rfloor} - \lceil n/2\rceil = \tfrac12 n\ln n + O(n)$;

         $C_n^{QM} = 2nH_n^{(2)} - 2n - H_n + 1 = (\tfrac13\pi^2 - 2)\,n + O(\log n)$;

         $C_n^{QMW} = O(n)$.

Note that in this model the union tree tends to be reasonably well-balanced, so the weighted algorithm saves only a factor of 2.
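
The closed forms for this third model can be checked against the recurrences with $p_{nk} = 1/(n-1)$. The sketch below is illustrative; it assumes the recurrence shape $x_n = c_n + \frac{2}{n-1}\sum_{0<k<n}x_k$ with $x_1 = 0$, where $c_n$ is n/2 (QF), the average of min(k, n-k) (QFW), $C_n^{QF}/n$ (quick merge) and $C_n^{QFW}/n$ (quick merge weighted), and it tests the closed forms $n(H_n-1)$, $nH_n - \tfrac12 nH_{\lfloor n/2\rfloor} - \lceil n/2\rceil$ and $2nH_n^{(2)} - 2n - H_n + 1$.

```python
from fractions import Fraction

def H(n, r=1):
    """Harmonic number H_n^(r) = sum_{1<=j<=n} 1/j^r, with H_0 = 0."""
    return sum((Fraction(1, j**r) for j in range(1, n + 1)), Fraction(0))

def solve_uniform(cost, n_max):
    """x_n = cost(n) + (2/(n-1)) (x_1 + ... + x_{n-1}), with x_1 = 0."""
    x = [Fraction(0)] * (n_max + 1)
    partial = Fraction(0)
    for n in range(2, n_max + 1):
        partial += x[n - 1]
        x[n] = cost(n) + 2 * partial / (n - 1)
    return x

N = 30
qf = solve_uniform(lambda n: Fraction(n, 2), N)
qfw = solve_uniform(
    lambda n: Fraction(sum(min(k, n - k) for k in range(1, n)), n - 1), N)
qm = solve_uniform(lambda n: qf[n] / n, N)
qmw = solve_uniform(lambda n: qfw[n] / n, N)

def qf_closed(n):
    return n * (H(n) - 1)

def qfw_closed(n):
    return n * H(n) - Fraction(n, 2) * H(n // 2) - (n + 1) // 2

def qm_closed(n):
    return 2 * n * H(n, 2) - 2 * n - H(n) + 1

print(all(qf[n] == qf_closed(n) and qfw[n] == qfw_closed(n)
          and qm[n] == qm_closed(n) for n in range(1, N + 1)))
print(float(qmw[N] / N))   # the weighted quick-merge cost per element
```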


14.  Open  Problems.


We have proved that the QFW algorithm has linear expected running time in the random graph model, and we have analyzed four distinct algorithms in the other models, but several related questions are still waiting to be resolved.

Perhaps the most important problem remaining is to determine the

^2  -1. 

asymptotic  behavior  of  ^ when  n ' ^ ^ ^ ^ ? since  our  estimates 

are unsatisfactory in this interval. Such an improvement should help in

the analysis of many other algorithms, because the function

describes the behavior of random graphs. A detailed knowledge of

would probably establish the conjecture (7.7), and perhaps it would also

lead to an analytic determination of the constant $\lim_{n\to\infty}(C_n^{QFW}/n)$.

Given random input sequences of length $\ell$ in the random graph model, is it true that the expected running time of algorithm QFW is $O(\ell)$? Our proof gives $O(\ell + n)$, which is satisfactory if $\ell$ is of order n at least; and for very small $\ell$ the individual components almost always have bounded size. But for $\ell \approx n/\log n$, say, we do not know how to answer this question.

Another natural problem the authors have not been able to resolve is the estimation of P(T,∞) for given trees T. This ought to shed further light on equivalence algorithms and the connectivity of random graphs.
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